Left-to-Right Hierarchical Phrase-based Machine Translation

نویسنده

  • Maryam Siahbani
چکیده

Hierarchical phrase-based translation (Hiero for short) models statistical machine translation (SMT) using a lexicalized synchronous context-free grammar (SCFG) extracted from word aligned bitexts. The standard decoding algorithm for Hiero uses a CKY-style dynamic programming algorithm with time complexity O(n3) for source input with n words. Scoring target language strings using a language model in CKY-style decoding requires two histories per hypothesis making it significantly slower than phrase-based translation which only keeps one history per hypothesis. In addition, the size of a Hiero SCFG grammar is typically much larger than phrase-based models when extracted from the same data which also slows down decoding. In this thesis we address these issues in Hiero by adopting a new translation model and decoding algorithm called Left-to-Right hierarchical phrase-based translation (LR-Hiero for short). LR-Hiero uses a constrained form of lexicalized SCFG rules to encode translation, where the target-side is constrained to be prefix-lexicalized. LRHiero uses a decoding algorithm with time complexity O(n2) that generates the target language output in left-to-right manner which keeps only one history per hypothesis resulting in faster decoding for Hiero grammars. The thesis contains the following contributions: • We propose a novel dynamic programming algorithm for the rule extraction phase. Unlike traditional Hiero rule extraction which performs a brute-force search, LR-Hiero rule extraction is linear in the number of rules. • We propose an augmented version of LR-decoding algorithm previously proposed by Watanabe et al. [90]. Our modified LR-decoding algorithm addresses issues related to decoding time and translation quality and is shown to be more efficient than the CKY decoding algorithm in our experimental results. • We extend our LR-decoding algorithm to capture all hierarchical phrasal alignments that are reachable in CKY-style decoding algorithms. • We introduce a lexicalized reordering model to LR-Hiero that significantly improves the translation quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation

Left-to-right (LR) decoding (Watanabe et al., 2006) is promising decoding algorithm for hierarchical phrase-based translation (Hiero) that visits input spans in arbitrary order producing the output translation in left to right order. This leads to far fewer language model calls, but while LR decoding is more efficient than CKY decoding, it is unable to capture some hierarchical phrase alignment...

متن کامل

Left-to-Right Target Generation for Hierarchical Phrase-Based Translation

We present a hierarchical phrase-based statistical machine translation in which a target sentence is efficiently generated in left-to-right order. The model is a class of synchronous-CFG with a Greibach Normal Form-like structure for the projected production rule: The paired target-side of a production rule takes a phrase prefixed form. The decoder for the targetnormalized form is based on an E...

متن کامل

A Markov Model of Machine Translation using Non-parametric Bayesian Inference

Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phraseinternal translation and reordering. However phrase-based approaches are much less able to model sentence level effects between different phrase-pairs. We propose a new model to address this imbalance, based on a word-based Markov model of translation which generates target tr...

متن کامل

Bidirectional Phrase-based Statistical Machine Translation

This paper investigates the effect of direction in phrase-based statistial machine translation decoding. We compare a typical phrase-based machine translation decoder using a left-to-right decoding strategy to a right-to-left decoder. We also investigate the effectiveness of a bidirectional decoding strategy that integrates both mono-directional approaches, with the aim of reducing the effects ...

متن کامل

Expressive Hierarchical Rule Extraction for Left-to-Right Translation

Left-to-right (LR) decoding Watanabe et al. (2006) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero) that visits input spans in arbitrary order producing the output translation in left to right order. This leads to far fewer language model calls. But the constrained SCFG grammar used in LR-Hiero (GNF) with at most two non-terminals is unable to account for some...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016